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We formulate the solution counting problem within the framework of inverse Ising problem and 
use fast belief propagation equations to estimate the entropy whose value provides an estimate on 
the true one. We test this idea on both diluted models (random 2-SAT and 3-SAT problems) and 

■ fully-connected model (binary perceptron), and show that when the constraint density is small, this 

■ estimate can be very close to the true value. The information stored by the salamander retina under 
, the natural movie stimuli can also be estimated and our result is consistent with that obtained by 

■ Monte Carlo method. Of particular significance is sizes of other metastable states for this real 

■ neuronal network are predicted. 
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I. INTRODUCTION 



Counting the number of solutions for random constraint satisfaction problems is a very important and nontrivial 
problem which belongs to #P-complete class in computational complexity [l[ and is much harder than determining 
. whether a random formula has any solutions. In practice, we can only sample a very small part of a huge solution space 
which contains an exponential number of solutions. However, can we predict the number of solutions in the whole 
solution space only based on a finite number of sampled solutions? This issue has generated broad interests across a 

■ variety of different disciplines such as computer science, probabilistic reasoning, statistical physics and computational 
^ , biology P-[l0| . An efficient sample-based counting strategy was proposed in Ref . [s*] . This strategy successively sets 

I ■ the most balanced variable until an exact counter is feasible on the reduced formula, and provides a lower bound on 
' O ] the true count. Alternatively, we address the solution counting problem within the framework of inverse Ising problem 
in examples of diluted models — random if-SAT problems and fully-connected model — binary perceptron, and show 
that our method yields an estimate whose value could be very close to the true count when the constraint density is 
small. The constraint density is defined as the ratio of the number of constraints to that of variables in the system. 
The inverse Ising problem ll| has recently attracted much attention not only in the development of fast mean 
kJ . field inverse algorithms but also in modeling vast amounts of biological data 0, [T5l - [T7| . The pairwise Ising 

■ model is able to capture most of the correlation structure of the real neuronal network activity and is much more 
\ informative than the independent model where each neuron is assumed to fire independently 0, [isj . The observed 

■ collective behavior of a large neuronal network results from interactions of many individual neurons. The joint acti vity 
' patterns for a retina under naturalistic stimuli were reported to convey information about the visual stimuli [sl, '19']. 

' . Estimating the information stored by the real neuronal network directly from data remains an open and important 

■ issue. We show in this work the information can be estimated reliably and sizes of metastable states for the neuronal 
T— I , network can also be predicted. 

^-H • The paper is organized as follows. The inverse Ising problem is introduced in Sec.|lTl together with a brief description 
' of the susceptibility propagation algorithm used to infer the disordered Ising model. In Sec. IIIIl we present the belief 
propagation to estimate the entropy from the data and apply this method to predict the entropies of four different 
examples only from a limited number of samplings. Finally, the conclusion suggests some implications of our study 
as well as potential applications of the presented methodology. 



II. THE INVERSE ISING PROBLEM 



For a system of N variables, one can collect P configurations or solutions {cr^ }(i = 1, . . . ,N\v — 1, . . . , P) either 
from real biological experiments (e.g., spike trains in multi-electrode array recordings [7]) or from random walks in 
the solution space of a model. We assume cji takes Ising-type value ±1. The task of the inverse Ising problem is to 
find couplings {Jy} and fields {hi\ to construct a minimal model 
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such that its magnetizations and pairwise correlations are compatible with those measured, i.e., {o'i)ising = 
('''«)data' ('''i'''i)ising ~ i'^i'^i) data' ^ partition function and the inverse temperature ^ = 1 as it can be ab- 

sorbed in the strength of couplings and fields. Hereafter, we define the measured magnetization and connected 
correlation as rrii = (o'i)(ia,ta ^^'^ = i'^i'^j) data ~ "^iTOj respectively where (■ • ■ denotes the average over the 
sampled configurations or solutions. 

We use susceptibility propagation (SusProp) to infer the couplings and fields. SusProp passes messages along 
the oriented edges of the network by iterative updating. To run SusProp, two kinds of messages are needed. One 
is the cavity magnetization of variable i in the absence of variable j denoted as mi^j; the other is the cavity 
susceptibility gi^j^k that is the response of cavity field of variable i without variable j to a local perturbation of 
external field of variable k The update rule can be derived using belief propagation Eq. @ and fluctuation- 

response relation J^, .2^ [2l| and reads as follows (2l| : 



rrii — rrij^^i tanh Jij 
1 — rriimj^i tanh Jij 



(2a) 



gi^j,k^5ik+ 7 — ^ tanh Jfi^i^i (2b) 

^ . 1 - (mi^itanh Jh)-^ 

J-w ^ 1 log f {^ + C,j){^-m,^,m,^i) \ 

2 ^\^(l-C,,)(l + m,^,m,^,); ' ' 
a, = - (1 - rn^)9^^,^J ^ ^^^^ (2d) 

where di\j denotes neighbors of variable i except j, 5ik is the Kronecker delta function and e S [0,1] is introduced 
as a damping factor and should be appropriately chosen to prevent the absolute updated tanh(Jij) from being larger 
than 1. In practice, all couplings are initially set to be zero and for every directed edge of the network, the message 
rrii^j is randomly initialized in the interval [—1.0, 1.0] and Qi^j^k = if i fc and 1.0 otherwise. The SusProp rule 
Eq. ^ is then iterated until either the inferred couplings converge within a predefined precision 77^^^ or the preset 

maximal number of iterations Tmix is exceeded. After the set of couplings is obtained, the fields are inferred via 
hi = tanh'^TOi) - Z^jgaitanh"^ [tanh Jumi^i]. 

To ensure a reliable estimate of the parameters, we define a convergence fraction R as the ratio of the number of 
converged couplings to the total number of edges in the network. In the non-convergent case, we take the inferred 
parameters corresponding to i?max = max{_Rf , t = 1, . . . , Tmax} where Rt is the convergence fraction of t-th iteration, 
^max = 1.0 if the update rule converges. For an inverse problem, {mi,Cy } serve as inputs to the update rule, and 
they are computed from P sampled solutions or configurations. We use stochastic local search algorithms to sample 
the solution space of random K-SAT {K = 2,3 here) formulas and that of the binary perceptron. For the retinal 
network, the configurations were obtained from the spike trains in the multi-electrode recording experiments (data 
courtesy of Gasper Tkacik, Refs. 0, Hi)- 

III. ESTIMATING THE ENTROPY FROM THE DATA 

We derive the entropy of the constructed Ising model Eq. ([1]) under Bethe approximation (also called cavity 
method 0|) assuming sufficiently weak interactions among variables. We compute the entropy through site contri- 
butions ASi and edge contributions ASij as S'ising = ^sising — J2i^^i ~ ^(ij) ^"^(u)' 
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AS,=\ogZ,~^ 



hic'^^ IT cosh Jh(1 + tanh Jiiini-fi) — hie cosh J;i(l — tanh Jumi- 

ledi ledi 



' ^Jh sinh Jh(1 + tanh Jumi^j) + Ju cosh Jh;(l - tanh^ J;Om;^ij • JJ^ cosh (1 + tanh Jy-mj^i) 

l&di j&di\l 

+ e~''' |^J;i sinh J;i(l — tanh JHTO;-!.i) — Ju cosh Jh;(l — tanh^ Jh;)m;^,;j • cosh Jij(l — tanh J^mj^i) 

ledi j£di\l 

(3a) 

AS, -\ogZ, J, tanh J,j +m,^j-mj^, 
^^-'^ '■^ 1 + tanh JijUii^jirij^i 

where di\l denotes neighbors of variable i except /. Zi = e'^' HieSi cosh Jii{l + rhi^i) + e^'*' Hzeai cosh Jh(1 — rhi-^i) 
and Zij — cosh Jij (1 + tanh Jijnii^jmj^i). The cavity magnetization mi^j obeys simple recursive equations: 

^ e^'' ^^g^^\J-(l + - e'^' n;ga»\3(l - 

TO/^i = tanh Jumi^i (4b) 

We first randomly initialize mi^j G [—1-0, 1.0] for every directed edge of the reconstructed network, then iterate 

Eq. ^ until all messages converge within the precision rj^'^^ or the maximal number of iterations TmiL is reached. 
From the fixed point, the entropy can be computed via Eq. ([3]). The case where some variable, say i is positively frozen, 
i.e., corresponding measured magnetization rrii — 1.0, can also be handled. In this case, hi = +oo, and ASi is reduced 

to be logZj - where Z^ = Yll^g^coshJu {1 + tanh Jumi^i) and Z^ = J2i£di -^ii sinh J;j(l + tanh Jumi^i) + 

Ju cosh Jii(l — tanh'^ Jii)mi^i ■ Y[jedi\i cosh Jij(l + tanh Jijirij^i). The edge contribution remains unchanged. The 

negatively frozen case is similarly treated. In numerical simulations, we adopt 77nax = 500, r/^' = 10~^, TmsL = 2000. 
?7^^) as well as e depends on the following specific applications. 

We remark here that Eq. ^ is used specifically for the solution counting problem where we now have known the 
magnetizations and correlations and additionally some frozen cases (some rrii = +1 or —1) should be treated. On 
the other hand, the coupling or field distributions depend on the collected data and Eq. is derived only under the 
weakly-coupled approximation but the fully connected topology is reserved. The first point is, the entropy we try to 
estimate is not only for two-body interaction system (e.g., random 2-SAT) but also for three-body interaction system 
and densely- interacted system (e.g., the binary perceptron where each constraint involves all variables of the system). 
The second point is, the sampled solutions come from the zero energy ground state and the sampling process is always 
confined in a single cluster (solutions in it are connected with each other by single variable flips). 

All underlying parameters of pairwise Ising model are predicted directly from the observed data and the entropy 
of the original model is estimated based on the constructed Ising model. We emphasize here that two layers of 
approximations are made. The first one is the disordered Ising model Eq. ^ is used to approximate the original model. 
When estimating the entropy from the data, we actually do not know the original model. The second layer is we use 
mean- field methods, specifically the message passing algorithms to infer the underlying parameters of the pairwise Ising 
model. Since the computational complexity of SusProp is 0{N^) for the fully-connected network, we focus on small size 
networks with N of order 0(10^). When the constraint density is small, the efficiency of our methodology is supported 
by two concrete examples: random K-SAT problem and the binary perceptron. For these two examples, we use Struo to 
represent the entropy density computed by belief propagation with the knowledge of the original model (for details, see 
Rcf. "23] for random iiT-SAT problem and Ref. [Ij] for binary perceptron). To show the efficiency of the pairwise Ising 
model, we also compute the independent entropy Sind = A^Sind = — X)i [ log -|- log assuming 
P{cr) ~ Yii ^ii'^i)- For retinal network, we could neither know the true model underlying the network nor get 
the true value for the entropy (when the network is large). Therefore we just compare the result obtained by our 
current fast belief propagation with that obtained by time-consuming Monte Carlo method and show that the belief 
propagation not only reproduces the entropy value evaluated by Monte Carlo method but also yields rich information 
about the metastable states which are relevant for neuronal population coding [l^. In this case, we denote sbp 
as the entropy density estimated by belief propagation and smc estimated by Monte Carlo method. Note that both 
belief propagation and Monte Carlo method under the reconstructed Ising model yield approximate value for the true 
entropy since we consider only up to second-order correlations in the observed data while the system may develop 
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FIG. 1: (Color online) The quality of the pairwise and independent models versus constraint density a. (a) Entropy density 
difference versus a. The data points connected by dashed line are the differences between Sind and Struc, while those connected 
by solid line are the differences sising — Strue- The number of variables N = 100 for random A"-SAT (r2-SAT or r3-SAT) 
problem and 101 for binary perceptron (bperc). Struc is computed with the knowledge of the original model using belief 
propagation [23l . [23 |. Each point represents the average over eight random samples, (b) Scatter plot comparing sising with 
Strue- The fuU line indicates equality. 

higher-order correlations in its solution space or energy landscape. The different natures of these examples imply 
wide applications of our methodology to evaluate the entropy of an unknown model with only a limited number of 
samplings. 

A. Random isT-SAT problem 

The random iT-SAT problem is finding a solution (an assignment of N boolean variables) satisfying a random 
formula composed of logical AND of M constraints [251]. Each constraint is a logical OR function of K randomly 
chosen distinct variables (either directed or negated with equal probability). The constraint density a — M/N. For 
K = 2, the threshold separating a SAT phase from an UNSAT phase was confirmed to be = 1 below which 
the solution space is ergodic and a simple local search algorithm can easily identify a soluti on [25| . For K = 3, 
the estimated threshold as — 4.267 below which the solution space exhibits richer structures [2g|. The dynamical 
transition point locates at ad — 3.86 and separates the ergodic phase from non-ergodic phase. We use SEQSAT 
algorithm of Ref. [13] to first find a solution for a given a, then 10^ single variable flips are performed in the current 
solution space, after that we perform random walks in the current solution space to sample one solution every 10* 
steps. Each step involves N attempts to move from one solution to its adjacent one by single variable fiip, i.e., a 
randomly chosen variable is flipped and if the new configuration is a solution, the flip is accepted with probability 
1/2; otherwise the movement is rejected. We sample totally P = 10^ solutions to estimate the entropy. We choose 
e = 0.128,77(1) = 10-3 for random 2-SAT and e = 0.002,77(1) = lO""* for random 3-SAT. Resuhs are reported in Fig.Hl 
When a is small, our method can predict the true entropy very well especially for K = 2 which can be actually 
transformed into a pairwise Ising model. As a increases, the difference between sising and struc ^23] becomes large and 
this deviation is more obvious for K = 3, which manifests the presence of higher-order correlations in the solution 
space. At high a (e.g., a — 3.0 for K — 3), the belief propagation Eq. (|4]) would yield multiple fixed points. This 
signals ergodicity breaking phenomenon in the energy landscape of the constructed Ising model or indicates that long 
range correlations develop in the original system Q , although our samplings are still confined in a single cluster of the 
original model, as a result, the predicted entropy becomes rather inaccurate compared with the true one computed 
under the original model. 

B. Binary perceptron 

The binary perceptron with N binary weights connecting TV input nodes to a single output node performs a 
random classification of aN random binary patterns {Ci'K* = 1,...,A'^;/U = l,...,aA'^). The critical constraint 
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FIG. 2: (Color online) (a) Histograms of inferred couplings and fields (inset) for the retinal network with the number of neurons 
= 40 (data courtesy of Gasper Tkacik, Refs. 7, 8]). To infer the network, we choose e = 0.128, r;'^' — 10~'^. The network is 
inferred at -Rmax ~ 0.959. (b) Reconstructed using belief propagation Eq. ^ versus the measured one. We only show the 
case i ^ j since = 1 — (mf^)^. The full line indicates equality. 

TABLE I: Estimated entropy density sbp for the inferred retinal network through belief propagation Eq. Q. qo — mf^irii 
where is computed from the fixed point of belief propagation Eq. Q. The self-overlap qi — mf ~ 0.906828. The last 

column gives the probability of appearance for each fixed point during 1000 runs of belief propagation with the same inferred 
parameters. 



SBP 


qo 


prob. app 


0.09771 


0.906816 


0.089 


0.1093 


0.512411 


0.075 


0.1224 


0.313566 


0.066 


0.1634 


0.281124 


0.400 


0.1765 


0.021960 


0.354 


0.1918 


0.264979 


0.016 



density as — 0.83 below which the solution space is non-empty (28j . Given an input pattern if the actual output 
o'^ = sgn ^X^iLi ^j^f ) is equal to the desired output Oq assigned a value ±1 with equal probabilities, the configuration 
(T learns this pattern. The solution space of the binary perceptron consists of all configurations learning aN random 
patterns. Before sampling, we first learn aN patterns using DWF algorithm of Ref. [29|. The sampling procedure is 
the same as that used for random K-SAT problems. In numerical simulations, we choose e = 0.001, = 10-^. The 
deviation of estimated sising from Strue [HI is plotted against a. For small a, our method can predict the true entropy 
well without the knowledge of the original model. The large deviation shown in Fig. [T]at high a implies higher-order 
correlations start to dominate the solution space. 

C. Retinal network 

A recording of the activity of 40 neurons in a salamander retina under natural movie stimuli could also be analyzed 
within the current setting. The total effective number of samplings P ~ 7 x 10^ [8]. Our estimated entropy density 
for the retinal network is sbp — 0.09771 consistent with that obtained by Gasper Tkacik et.al [sj using Monte Carlo 
method which produces an estimate smc — 0.09479 but is rather time consuming for large N. Our result implies 
that the retina under the naturalistic movie stimuli stores e^*^'' ~ 50 effective configurations. If we approximate 
the true entropy using that calculated by belief propagation (Eq. ([3]) and Eq. or Monte Carlo method, then the 
multi-information measuring the total amount of correlations in the network 30] Ibp = Sind — sbp or Imc = Sind — smc 
where Sind is the independent entropy density. The result is that the difference between these two multi-information 
values Imc — Ibp — 0.00292. The histogram of inferred parameters is shown in Fig. [2] (a). Note that most of 
predicted couplings concentrate around zero value with a long tail of distribution for large negative couplings whose 
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weights are rather small. Most of the predicted fields are negative since most of the neurons are silent across the 
movie presentations. Using the same inferred parameters, we run belief propagation Eq. Q 1000 times from different 
random initializations. Several fixed points are found and one of them is consistent with the previous result Q (see 
Table m. These fixed points represent different metastable states and the entropy measures the capacity of neurons to 
convey information about the visual stimulus which contains high-order correlation structure. The visual information 
could be encoded by identity of the basin of attraction 19] and these predicted metastable states may code for specific 
stimulus features. Therefore, the information of the inputs to the retina can be stored in the couplings and fields 
which generate a free energy landscape with multiple metastable states for redundant error correction [19]. Future 
research on neuronal population coding needs to elucidate this point. 

In Fig. [2] (b), we verify that the inferred pairwise Ising model reproduces the measured connected correlations 
with very goo d ag reement. The reconstructed correlations {C|^^} can be computed by the following message passing 
algorithm [Hi, [Hj] : 

rhi^i = tanh Jumi^^ (5b) 



gi^j^k = Sik+ -. 7 — ta.nhJugi^ik (5c) 

^ 1 — (nii^i tanh Jh) 

ledi\j ^ ' 

where two kinds of messages, rrii^j and gi-^j.k are updated. Once both messages for each directed link in the network 
are converged, i.e., iteration of Eq. ([5]) reaches fixed point, we compute the predicted connected correlations {C^?^^} 
via 



C^^ = (Cij - m^mj)gj^ij + (1 - m'^)gi^jj (6a 
1 + tanh Jam 



~ tanhJij+nii^jnij^i 



where the Ising model is known and all messages needed to compute Cjj^ including and nij are read from the fixed 
point. This message passing strategy to evaluate the correlations is very fast and takes tens of iterations to converge. 
Remarkably, the estimated magnetizations and correlations fit those measured very well. However, using Monte 
Carlo samplings to reconstruct the correlations, we failed to reproduce those measured. For example, after sufficient 
thermalization, the configuration is sampled every 10'* Monte Carlo sweeps, then the correlations are computed with 
10"^ sampled configurations. The obtained root mean square error is about 0.23. Possible reason is, the reconstructed 
Ising model by SusProp already develops multiple states (different fixed points of belief propagation Eq. Q), therefore, 
at temperature T = 1.0 and N = 40, as observed in our simulations, Monte Carlo samplings can have transitions 
between different states yielding the average energy density of sampled configurations e ~ —3.7119 with fluctuation of 
order 0.0724 while the state reproducing the measured correlations in Fig. [2](b) has typical energy density —3.62954 
but smallest entropy density sbp — 0.09771 of all observed states. Actually, in this case, the correlation computed 
by Monte Carlo samplings corresponds to the average over different states while the measured data comes from one 
state of the reconstructed Ising model. 



D. Convergence patterns and entropy density difference versus P 



There exist three kinds of convergence patterns for different iterations of SusProp rules. One is the convergence 
case shown by an example of random 2-SAT with a = 0.7; the second type is the convergence fraction R first increases 
then decreases (see in Fig. [3] an example of binary perceptron with a = 0.5); the last type is R reaches a plateau with 
small fluctuations shown by an example of retina. 

In Fig. |31[a), the influence of the number of samplings P on the estimation of entropy from the finite samplings 
is shown. It seems that the entropy density difference decreases as P becomes small. Note that we construct the 
pairwise Ising model based on the samplings from the ground state (zero energy for these three constraint satisfaction 
problems) and the underlying graphical model (e.g., r3-SAT or bperc) may not be a pairwise Ising model. Our 
samplings are always confined in a single solution cluster and the quality may depend on the fine structure of the 
solution space [34-(34| . This case is different from studies on the reconstruction of Sherrington-Kirkpatrick model at 
high temperatures [20]. In our current setting, when the sampling number P decreases, the collected data seems to 
have more correlations (this may be induced by the statistical errors) and the SusProp turns out to be not converged 
any more (e.g., in the case of r2-SAT with a = 0.3), which can be justified from the Fig. Sfb) that the inferred 
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FIG. 3: (Color online) Convergence patterns possibly appearing in the iteration of SusProp update rules. 




p coupling 

FIG. 4: (Color online) (a) Entropy density difTerence versus P. The error bar shows the fluctuation across eight random 
samples, (b) The distribution of inferred couplings for r3-SAT with a = 1.0 and A'^ = 100. Two results for P = 500 and 
P = 12500 are shown. 

coupling distribution becomes broader with decreasing P and thus the estimated entropy should take smaller values. 
As observed in Fig. SJ^b), once P decreases down to some value, the estimated entropy value would underestimate 
the true one computed with the knowledge of the original model. On the other hand, given small P, the computed 
magnetizations and correlations would have large statistical errors and may not correctly reflect the correlations in the 
sampled solution cluster. Safely, we select P — 10^ to reduce the statistical error for calculating the magnetizations 
and correlations. 



IV. CONCLUSION 

In this work, we address the important problem of counting the total number of solutions (configurations) based on 
a limited number of sampled solutions (configurations). We formulate the solution (configuration) counting problem 
within the framework of inverse Ising problem, and this idea is tested on both diluted models and fully-connected 
models, as well as on real neural data. In the first case, we do not know a priori the underlying graphical models 
and try to construct a disordered Ising model from the collected data to evaluate the entropy of those unknown 
models. Note that the sampled solutions come from the zero energy ground state (single solution cluster) and the 
number is limited to P but the size of that sampled cluster is evaluated. To this end, the pairwise model improves 
substantially the independent model (see Fig.[lja)). When the constraint density is small then the pairwise correlation 
dominates the solution space, the estimated entropy gets very close to the true one estimated with the knowledge 
of the original model. Another interesting point to be demonstrated in our further work is, for small N, one can 
compute the magnetizations and correlations through exact enumeration and further the real entropy value. The 
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result of Boltzmann learning algorithm fu\ and Monte Carlo simulation \§\ (using a large enough amount of Monte 
Carlo samplings) should provide an upper bound on the true entropy. Instead, using the approximate method we 
proposed, whether the obtained result provides a bound should be checked for small size system or proved in the limit 
of large N, provided that the magnetizations and correlations are less noisy. In the second case, the susceptibility 
propagation is applied to infer the retinal network and belief propagation is used to reproduce the entropy computed 
by Monte Carlo method. This message passing scheme is very fast and efficient especially for large network and 
the observed multiple fixed points predict other metastable states in the inferred retinal network. These metastable 
states may have intimate relation with the neuronal population coding .19] . Extensions to the neuronal interaction 
network organized in a hierarchical and modular manner would be very interesting [ssl . l36j . Our presented framework 
constructs a statistical mechanics description of the system directly from either artificial data or real data, and has 
the potential to describe biological networks more generally and estimate the size of the solution space in various 
contexts especially when pairwise correlation dominates the system. 
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